Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided, including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the `creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human-centric: where it is designed to augment, rather than
replace, human creativity.
Image Fusion via Sparse Regularization with Non-Convex Penalties
The L1 norm regularized least squares method is often used for finding sparse
approximate solutions and is widely applied in 1-D signal restoration. Basis
pursuit denoising (BPD) performs noise reduction in this way. However, the
shortcoming of using L1 norm regularization is the underestimation of the true
solution. Recently, a class of non-convex penalties has been proposed to
improve this situation. Each such penalty function is non-convex itself, but is
designed so that the overall cost function remains convex. This approach has
been confirmed to offer good performance in 1-D signal denoising. This paper
extends the aforementioned method to 2-D signals (images) and applies it
to multisensor image fusion. The problem is posed as an inverse one and a
corresponding cost function is judiciously designed to include two data
attachment terms. The whole cost function is proved to be convex upon suitably
choosing the non-convex penalty, so that the cost function minimization can be
tackled by convex optimization approaches involving only simple computations.
The performance of the proposed method is benchmarked against a number of
state-of-the-art image fusion techniques and superior performance is
demonstrated both visually and in terms of various assessment measures.
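
To make the formulation concrete, below is a minimal sketch (not the authors'
exact algorithm) of this style of fusion: two quadratic data-attachment terms
plus a separable minimax-concave (MCP-style) penalty, minimized by
proximal-gradient (ISTA-style) iterations using the firm-threshold proximal
operator. The parameter values, and the simplification of penalizing pixels
directly rather than sparsifying-transform coefficients, are illustrative
assumptions.

    import numpy as np

    def firm_threshold(v, lam, gamma):
        # Proximal operator of the minimax-concave penalty (firm thresholding):
        # zero below lam, identity above gamma*lam, linear ramp in between.
        out = np.zeros_like(v)
        mid = (np.abs(v) > lam) & (np.abs(v) <= gamma * lam)
        big = np.abs(v) > gamma * lam
        out[mid] = np.sign(v[mid]) * gamma * (np.abs(v[mid]) - lam) / (gamma - 1.0)
        out[big] = v[big]
        return out

    def fuse(y1, y2, lam=0.05, gamma=4.0, step=0.45, iters=200):
        # Minimize 0.5||x - y1||^2 + 0.5||x - y2||^2 + lam*phi(x) by proximal
        # gradient; the quadratic part has Lipschitz constant 2, so any step
        # below 0.5 is safe.
        x = 0.5 * (y1 + y2)                  # start from the mean image
        for _ in range(iters):
            grad = (x - y1) + (x - y2)       # gradient of the two data terms
            x = firm_threshold(x - step * grad, step * lam, gamma)
        return x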
Object recognition in atmospheric turbulence scenes
The influence of atmospheric turbulence on acquired surveillance imagery
poses significant challenges in image interpretation and scene analysis.
Conventional approaches for target classification and tracking are less
effective under such conditions. While deep-learning-based object detection
methods have shown great success in normal conditions, they cannot be directly
applied to atmospheric turbulence sequences. In this paper, we propose a novel
framework that learns distorted features to detect and classify object types in
turbulent environments. Specifically, we utilise deformable convolutions to
handle spatial turbulent displacement. Features are extracted using a feature
pyramid network, and Faster R-CNN is employed as the object detector.
Experimental results on a synthetic VOC dataset demonstrate that the proposed
framework outperforms the benchmark with a mean Average Precision (mAP) score
exceeding 30%. Additionally, subjective results on real data show significant
improvement in performance.
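
As a hedged illustration of the two ingredients named above, the sketch below
pairs a torchvision Faster R-CNN detector (with its standard FPN backbone) with
a deformable-convolution block of the kind used to absorb turbulent spatial
displacement. The block design and all parameters are assumptions for
illustration, not the paper's architecture; the block is shown standalone,
whereas the paper integrates deformable convolutions into the feature
extractor.

    import torch
    import torchvision
    from torchvision.ops import DeformConv2d

    class DeformBlock(torch.nn.Module):
        # 3x3 deformable convolution whose per-pixel sampling offsets are
        # predicted from the input itself (2 offsets per kernel tap = 18 ch).
        def __init__(self, channels):
            super().__init__()
            self.offset = torch.nn.Conv2d(channels, 18, kernel_size=3, padding=1)
            self.deform = DeformConv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            return self.deform(x, self.offset(x))

    # Off-the-shelf detector with an FPN backbone; the paper's framework would
    # additionally be trained on turbulence-distorted (synthetic VOC) imagery.
    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=21)
    detector.eval()
    with torch.no_grad():
        preds = detector([torch.rand(3, 480, 640)])  # dicts of boxes/labels/scores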
Towards a Robust Framework for NeRF Evaluation
Neural Radiance Field (NeRF) research has attracted significant attention
recently, with 3D modelling, virtual/augmented reality, and visual effects
driving its application. While current NeRF implementations can produce
high-quality visual results, there is a conspicuous lack of reliable methods for
evaluating them. Conventional image quality assessment methods and analytical
metrics (e.g. PSNR, SSIM and LPIPS) only provide approximate indicators of
performance, since they assess the output of the entire NeRF pipeline rather
than its individual components.
Hence, in this paper, we propose a new test framework which isolates the neural
rendering network from the NeRF pipeline and then performs a parametric
evaluation by training and evaluating the NeRF on an explicit radiance field
representation. We also introduce a configurable approach for generating
representations specifically for evaluation purposes. This employs ray-casting
to transform mesh models into explicit NeRF samples, as well as to "shade"
these representations. Combining these two approaches, we demonstrate how
different "tasks" (scenes with different visual effects or learning strategies)
and types of networks (NeRFs and depth-wise implicit neural representations
(INRs)) can be evaluated within this framework. Additionally, we propose a
novel metric to measure task complexity within the framework, which accounts for
visual parameters and the distribution of the spatial data. Our approach offers
the potential to create a comparative objective evaluation framework for NeRF
methods.
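
For reference, the quantity that the isolated neural rendering network must
reproduce is the standard NeRF volume-rendering quadrature: given per-sample
densities \sigma_i, colours \mathbf{c}_i and spacings \delta_i along a ray
\mathbf{r},

    C(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
    \qquad
    T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right).

Evaluating a network against an explicit radiance field thus amounts to
supplying known (\sigma_i, \mathbf{c}_i) samples and comparing the rendered
C(\mathbf{r}) with the analytic ground truth.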
Unsupervised Image Fusion Using Deep Image Priors
A significant number of researchers have applied deep learning methods to
image fusion. However, most works require a large amount of training data or
depend on pre-trained models or frameworks to capture features from source
images. This is inevitably hampered by a shortage of training data or a
mismatch between the framework and the actual problem. Deep Image Prior (DIP)
has been introduced to exploit convolutional neural networks' ability to
synthesize the 'prior' in the input image. However, the original design of DIP
is difficult to generalize to multi-image processing problems, particularly
image fusion. Therefore, we propose a new image fusion technique that extends
DIP to fusion tasks formulated as inverse problems. Additionally, we apply a
multi-channel approach to enhance DIP's effect further. The evaluation is
conducted with several commonly used image fusion assessment metrics. The
results are compared with state-of-the-art image fusion methods. Our method
outperforms these techniques for a range of metrics. In particular, it is shown
to provide the best objective results for most metrics when applied to medical
images.
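
Below is a minimal DIP-style fusion sketch under simplifying assumptions: a
deliberately small CNN stands in for the encoder-decoder network of the
original DIP, a fixed noise tensor is the input, and the fused image is the
single network output fitted jointly to both sources via two data terms (the
paper's inverse-problem and multi-channel refinements are omitted).

    import torch
    import torch.nn as nn

    class TinyDIPNet(nn.Module):
        # A small stand-in for the hourglass CNN used in DIP; the network
        # structure itself acts as the image prior.
        def __init__(self, zc=32, out_c=1):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(zc, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, out_c, 3, padding=1), nn.Sigmoid(),  # images in [0,1]
            )

        def forward(self, z):
            return self.body(z)

    def dip_fuse(y1, y2, iters=2000, lr=1e-2):
        # y1, y2: pre-registered sources of shape (1, C, H, W), values in [0,1].
        # Fit one network (from fixed noise) to both; its output is the fusion.
        net = TinyDIPNet(out_c=y1.shape[1])
        z = torch.randn(1, 32, y1.shape[-2], y1.shape[-1])   # fixed noise input
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(iters):
            opt.zero_grad()
            x = net(z)
            loss = nn.functional.mse_loss(x, y1) + nn.functional.mse_loss(x, y2)
            loss.backward()
            opt.step()
        return net(z).detach()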
Optimal Transport-based Graph Matching for 3D retinal OCT image registration
Registration of longitudinal optical coherence tomography (OCT) images
assists disease monitoring and is essential in image fusion applications. Mouse
retinal OCT images are often collected for longitudinal study of eye disease
models such as uveitis, but their quality is often poor compared with human
imaging. This paper presents a novel and efficient framework involving an
optimal transport based graph matching (OT-GM) method for 3D mouse OCT image
registration. We first perform registration of fundus-like images obtained by
projecting all b-scans of a volume on a plane orthogonal to them, hereafter
referred to as the x-y plane. We introduce Adaptive Weighted Vessel Graph
Descriptors (AWVGD) and 3D Cube Descriptors (CD) to identify the correspondence
between nodes of graphs extracted from segmented vessels within the OCT
projection images. Computing the AWVGD involves only scaling, translation and
rotation operations, which are computationally efficient, whereas the CD
exploits 3D spatial and frequency
domain information. The OT-GM method subsequently performs the correct
alignment in the x-y plane. Finally, registration along the direction
orthogonal to the x-y plane (the z-direction) is guided by the segmentation of
two important anatomical features peculiar to mouse b-scans, the Internal
Limiting Membrane (ILM) and the hyaloid remnant (HR). Both subjective and
objective evaluation results demonstrate that our framework outperforms other
well-established methods on mouse OCT images within a reasonable execution
time.
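
To illustrate the optimal-transport matching step in isolation, here is a
hedged sketch using entropic OT (Sinkhorn iterations) between node descriptors
of the two vessel graphs. The random descriptors are placeholders for the
paper's AWVGD/CD features, and the uniform marginals and parameter values are
assumptions.

    import numpy as np

    def sinkhorn(C, eps=0.05, iters=500):
        # Entropic optimal transport between uniform marginals for an n x m
        # cost matrix C; returns the soft transport plan diag(u) K diag(v).
        n, m = C.shape
        a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
        K = np.exp(-C / eps)
        u = np.ones(n)
        for _ in range(iters):
            v = b / (K.T @ u)
            u = a / (K @ v)
        return u[:, None] * K * v[None, :]

    # Toy usage: one 8-D descriptor per graph node in each projection image.
    rng = np.random.default_rng(0)
    d1, d2 = rng.random((12, 8)), rng.random((10, 8))
    C = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=-1)  # pairwise cost
    P = sinkhorn(C)
    matches = P.argmax(axis=1)   # hard node correspondences from the soft plan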